# Install chemprop and RDKit
# Colab already has PyTorch preinstalled
# Pin chemprop v1: v2 replaced the `chemprop_train` command used below with `chemprop train`
! pip install "chemprop<2" rdkit
# Download the ESOL dataset
! mkdir -p data
! wget https://raw.githubusercontent.com/schwallergroup/ai4chem_course/main/notebooks/02%20-%20Supervised%20Learning/data/esol.csv -O data/esol.csv
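The training commands in this notebook call `chemprop_train`, the command-line entry point of chemprop v1; as far as we know, chemprop v2 replaced it with `chemprop train` and a different set of arguments. The standard-library sketch below reports which major version is installed, so a mismatch shows up before training starts. The helper name `installed_major_version` is our own, not part of chemprop.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_major_version(package: str) -> Optional[int]:
    """Return the major version of an installed package, or None if it is missing."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


major = installed_major_version("chemprop")
if major is None:
    print("chemprop is not installed in this environment")
elif major >= 2:
    print(f"chemprop {major}.x found: the `chemprop_train` command used here may not exist")
else:
    print(f"chemprop {major}.x found: `chemprop_train` should be available")
```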
Week 3 tutorial 2 - AI 4 Chemistry
Table of contents
- Relevant packages
- Train GNNs using chemprop
0. Relevant packages
Chemprop
The Chemprop package contains message passing neural networks (MPNNs) for molecular property prediction. For molecules, they are described in the paper Analyzing Learned Molecular Representations for Property Prediction and used in the paper A Deep Learning Approach to Antibiotic Discovery; for reactions, see Machine Learning of Reaction Properties via Learned Representations of the Condensed Graph of Reaction.
Documentation: Full documentation of Chemprop is available at https://chemprop.readthedocs.io/en/latest/.
Website: A web prediction interface with some trained Chemprop models is available at chemprop.csail.mit.edu.
Tutorial: These slides provide a Chemprop tutorial and highlight recent additions as of April 28th, 2020.
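To make "message passing" concrete, the sketch below runs one round of a simplified, atom-level update on a hypothetical three-atom graph (the heavy atoms of ethanol, C-C-O) with a single scalar feature per atom. Each atom sums its neighbours' states (the message), combines that sum with its own state, and a readout sums over atoms to give one number per molecule. Chemprop itself passes messages along directed bonds and learns weight matrices; the fixed weights `w_self` and `w_msg` here are illustrative assumptions, not Chemprop's parameters.

```python
from typing import Dict, List

# Hypothetical toy graph: heavy atoms of ethanol, C(0)-C(1)-O(2)
neighbors: Dict[int, List[int]] = {0: [1], 1: [0, 2], 2: [1]}
# One scalar feature per atom (here, the atomic number)
h0: Dict[int, float] = {0: 6.0, 1: 6.0, 2: 8.0}


def relu(x: float) -> float:
    return max(0.0, x)


def message_passing_step(h: Dict[int, float], neighbors: Dict[int, List[int]],
                         w_self: float = 0.5, w_msg: float = 0.5) -> Dict[int, float]:
    """One round: m_v = sum of neighbour states; h_v' = relu(w_self*h_v + w_msg*m_v)."""
    new_h = {}
    for v, nbrs in neighbors.items():
        m_v = sum(h[u] for u in nbrs)                  # aggregate messages from neighbours
        new_h[v] = relu(w_self * h[v] + w_msg * m_v)   # update the atom state
    return new_h


def readout(h: Dict[int, float]) -> float:
    """Molecule-level representation: sum of the atom states."""
    return sum(h.values())


h1 = message_passing_step(h0, neighbors)
print(h1)           # {0: 6.0, 1: 10.0, 2: 7.0}
print(readout(h1))  # 23.0
```

Applying the step k times lets each atom's state depend on atoms up to k bonds away, which is why message passing networks stack several rounds before the readout.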
Set random seeds so that experiments are repeatable.
import random

import numpy as np
import torch

# Random seeds and reproducibility
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0)
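Seeding works because pseudo-random generators are deterministic: resetting the seed replays the same sequence of numbers. The sketch below shows this with Python's `random` module only; `sample_after_seed` is a hypothetical helper. One caveat: commands prefixed with `!` run in a separate process, so the seeds set in this notebook do not reach `chemprop_train`, which has its own seed arguments (see chemprop/args.py).

```python
import random
from typing import List


def sample_after_seed(seed: int, n: int = 3) -> List[float]:
    """Reset Python's global RNG with `seed`, then draw n floats in [0, 1)."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = sample_after_seed(0)
second = sample_after_seed(0)
print(first == second)                # True: same seed, same sequence
print(first == sample_after_seed(1))  # False: a different seed gives different draws
```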
1. Train GNNs using chemprop
To train a GNN model, run:
chemprop_train --data_path <path> --dataset_type <type> --save_dir <dir>
where <path> is the path to a CSV file containing the dataset, <type> is one of [classification, regression, multiclass, spectra] depending on the type of the dataset, and <dir> is the directory where training results and model checkpoints will be saved. For more details on the CSV data format, please see here.
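By default, chemprop v1 reads the SMILES from the first column of the CSV and treats every remaining column as a prediction target, so it is worth checking the header before training. The stdlib-only sketch below summarises a CSV under that assumption; `summarise_chemprop_csv` is a hypothetical helper, and the three-row sample is toy data, not real measurements.

```python
import csv
import io
import os
from typing import Dict, Iterable


def summarise_chemprop_csv(lines: Iterable[str]) -> Dict[str, object]:
    """Summarise a CSV assuming chemprop's default layout:
    a header row, SMILES in the first column, targets in the remaining columns."""
    reader = csv.reader(lines)
    header = next(reader)
    n_rows = missing = non_numeric = 0
    for row in reader:
        n_rows += 1
        for value in row[1:]:
            if value.strip() == "":
                missing += 1       # empty target cell
                continue
            try:
                float(value)
            except ValueError:
                non_numeric += 1   # cannot serve as a regression target
    return {
        "smiles_column": header[0],
        "target_columns": header[1:],
        "n_rows": n_rows,
        "missing_targets": missing,
        "non_numeric_targets": non_numeric,
    }


# Toy example (made-up values, not real solubilities)
toy = io.StringIO("smiles,y\nCCO,0.5\nc1ccccc1,\nCC(=O)O,abc\n")
print(summarise_chemprop_csv(toy))

# Check the downloaded ESOL file, if it is present
if os.path.exists("data/esol.csv"):
    with open("data/esol.csv", newline="") as f:
        print(summarise_chemprop_csv(f))
```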
For example:
chemprop_train --data_path data/tox21.csv --dataset_type classification --save_dir tox21_checkpoints
A full list of available command-line arguments can be found in chemprop/args.py.
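When training from a Python script rather than a notebook cell, the same command can be assembled as an argument list and passed to `subprocess.run`. The sketch below is one way to do that; `build_train_command` is a hypothetical helper that uses only the flags shown in this section, checks `--dataset_type` against the four types listed above, and prints the command instead of running it.

```python
import shlex
from typing import List, Optional

DATASET_TYPES = {"classification", "regression", "multiclass", "spectra"}


def build_train_command(data_path: str, dataset_type: str, save_dir: str,
                        extra: Optional[List[str]] = None) -> List[str]:
    """Assemble a chemprop_train argument list; `extra` holds any further flags."""
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type!r}")
    cmd = ["chemprop_train", "--data_path", data_path,
           "--dataset_type", dataset_type, "--save_dir", save_dir]
    if extra:
        cmd.extend(extra)
    return cmd


cmd = build_train_command("data/tox21.csv", "classification", "tox21_checkpoints")
print(shlex.join(cmd))
# chemprop_train --data_path data/tox21.csv --dataset_type classification --save_dir tox21_checkpoints
# To actually train: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.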
For model evaluation metrics, please see README.md.
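The ESOL run below is scored with `--metric rmse`, the root-mean-square error between predicted and measured values, RMSE = sqrt(mean((y_pred - y_true)^2)), expressed in the same units as the target. A minimal stdlib sketch of that formula follows, with made-up numbers; it is not chemprop's own implementation.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up targets and predictions: every prediction is off by exactly 1
print(rmse([0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 3.0, 2.0]))  # 1.0
```

Because errors are squared before averaging, RMSE penalises a few large mistakes more heavily than many small ones.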
! chemprop_train --data_path data/esol.csv \
--dataset_type regression \
--save_dir esol_ckpts \
--metric rmse \
--split_sizes 0.7 0.1 0.2 \
--epochs 60
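`--split_sizes 0.7 0.1 0.2` asks chemprop for 70% training, 10% validation and 20% test data. The sketch below shows what a seeded random split with those fractions looks like in plain Python; `random_split` is a hypothetical helper, and chemprop's own splitting code (how it rounds split sizes, which split type it uses) may differ.

```python
import random
from typing import List, Sequence, Tuple


def random_split(n_items: int, sizes: Sequence[float] = (0.7, 0.1, 0.2),
                 seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n_items-1 with a fixed seed, then cut them into train/val/test."""
    if abs(sum(sizes) - 1.0) > 1e-8:
        raise ValueError("split sizes must sum to 1")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)   # local RNG: does not touch global state
    n_train = int(sizes[0] * n_items)
    n_val = int(sizes[1] * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


train, val, test = random_split(10)
print(len(train), len(val), len(test))  # 7 1 2
```

Because the shuffle is seeded, calling it twice with the same seed returns the same three index lists, which is what makes a split reproducible across runs.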